Overview

Dataset statistics

Number of variables: 20
Number of observations: 50000
Missing cells: 50168
Missing cells (%): 5.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 33.7 MiB
Average record size in memory: 707.2 B
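As a consistency check, the dataset-level missing-cell figures can be recomputed from the per-column counts reported in this profile. This sketch assumes that Function_prediction_source, Function_Prediction_source, Molecular_weight and Instability_index are the only columns with missing values; their counts summing exactly to the reported 50168 supports that assumption.

```python
# Per-column missing counts reported in this profile (assumed to be the
# only columns with missing values).
missing_by_column = {
    "Function_prediction_source": 22870,
    "Function_Prediction_source": 27130,
    "Molecular_weight": 84,
    "Instability_index": 84,
}
n_rows, n_columns = 50_000, 20

missing_cells = sum(missing_by_column.values())
# Share of all cells in the 50000 x 20 table that are missing.
missing_pct = 100 * missing_cells / (n_rows * n_columns)

print(f"Missing cells: {missing_cells}")
print(f"Missing cells (%): {missing_pct:.1f}%")
```

The missing counts of the two differently capitalized prediction-source columns also add up to exactly 50000, the number of observations.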

Variable types

Text: 4
Categorical: 5
Numeric: 11

Alerts

Aromaticity is highly overall correlated with Oxidized_coefficient and 1 other field (High correlation)
Function_Prediction_source is highly overall correlated with Protein_source (High correlation)
Function_prediction_source is highly overall correlated with Phage_source and 1 other field (High correlation)
Molecular_weight is highly overall correlated with Oxidized_coefficient and 1 other field (High correlation)
Oxidized_coefficient is highly overall correlated with Aromaticity and 2 other fields (High correlation)
Phage_source is highly overall correlated with Function_prediction_source and 1 other field (High correlation)
Protein_source is highly overall correlated with Function_Prediction_source and 2 other fields (High correlation)
Reduced_coefficient is highly overall correlated with Aromaticity and 2 other fields (High correlation)
Start is highly overall correlated with Stop (High correlation)
Stop is highly overall correlated with Start (High correlation)
Protein_source is highly imbalanced (94.2%) (Imbalance)
Function_prediction_source has 22870 (45.7%) missing values (Missing)
Function_Prediction_source has 27130 (54.3%) missing values (Missing)
Protein_ID has unique values (Unique)
Aromaticity has 8261 (16.5%) zeros (Zeros)
Instability_index has 836 (1.7%) zeros (Zeros)
Helix_fraction has 2231 (4.5%) zeros (Zeros)
Turn_fraction has 2985 (6.0%) zeros (Zeros)
Sheet_fraction has 2367 (4.7%) zeros (Zeros)
Reduced_coefficient has 13696 (27.4%) zeros (Zeros)
Oxidized_coefficient has 13220 (26.4%) zeros (Zeros)
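The 94.2% imbalance reported for Protein_source can be reproduced with a normalized-entropy score: one minus the Shannon entropy of the class frequencies divided by its maximum possible value, ln(k) for k classes. The sketch below assumes that definition (it matches the reported figure); it is an illustration, not ydata-profiling's own code.

```python
import math

def imbalance(counts):
    """Return 1 - H / ln(k): 0 for perfectly balanced classes, near 1 when one class dominates."""
    k = len(counts)
    if k < 2:
        return 0.0  # a single class has no balance to measure (a choice made for this sketch)
    total = sum(counts)
    # Shannon entropy in nats; empty classes contribute nothing.
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log(k)

# Protein_source counts: prodigal, RefSeq, Genbank, DDBJ, EMBL
print(f"Protein_source: {imbalance([49210, 521, 242, 20, 7]):.1%}")
# Strand counts: '-', '+'
print(f"Strand: {imbalance([25109, 24891]):.1%}")
```

The near 50/50 split of Strand scores essentially zero, which is why only Protein_source is flagged.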

Reproduction

Analysis started: 2025-07-29 12:15:02.726461
Analysis finished: 2025-07-29 12:15:20.462748
Duration: 17.74 seconds
Software version: ydata-profiling v0.0.dev0
Download configuration: config.json

Variables

Distinct: 47845
Distinct (%): 95.7%
Missing: 0
Missing (%): 0.0%
Memory size: 4.4 MiB

Length

Max length: 87
Median length: 85
Mean length: 34.65984
Min length: 5

Characters and Unicode

Total characters: 1732992
Distinct characters: 67
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 45794
Unique (%): 91.6%

Sample

1st row: NC_011019.1
2nd row: NC_008723.1
3rd row: NC_017968.1
4th row: NC_009820.1
5th row: NC_018863.1

Words

Value  Count  Frequency (%)
imgvr_uvig_3300008299_000009|3300008299|ga0114868_1000024  4  < 0.1%
station168_dcm_all_assembly_node_569_length_94681_cov_11.157230  4  < 0.1%
imgvr_uvig_3300045988_102218|3300045988|ga0495776_017886  4  < 0.1%
mgv-genome-0378315  4  < 0.1%
mycobacterium_phage_porcelain  4  < 0.1%
nc_030936.1  4  < 0.1%
imgvr_uvig_3300029604_000307|3300029604|ga0245147_100033|79620-241832  4  < 0.1%
mgv-genome-0380120  4  < 0.1%
uvig_555935  3  < 0.1%
station180_zzz_all_assembly_node_179_length_172350_cov_80.433280  3  < 0.1%
Other values (47835)  49962  99.9%

Most occurring characters

Value  Count  Frequency (%)
0  189725  10.9%
_  138641  8.0%
3  107075  6.2%
1  90650  5.2%
2  84127  4.9%
8  82152  4.7%
5  80230  4.6%
4  78857  4.6%
9  73716  4.3%
7  70518  4.1%
Other values (57)  737301  42.5%

Most occurring categories

Value  Count  Frequency (%)
(unknown)  1732992  100.0%

Most occurring scripts

Value  Count  Frequency (%)
(unknown)  1732992  100.0%

Most occurring blocks

Value  Count  Frequency (%)
(unknown)  1732992  100.0%

Protein_source
Categorical

High correlation, Imbalance

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.1 MiB

prodigal  49210
RefSeq  521
Genbank  242
DDBJ  20
EMBL  7

Length

Max length: 8
Median length: 8
Mean length: 7.97216
Min length: 4

Characters and Unicode

Total characters: 398608
Distinct characters: 23
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: RefSeq
2nd row: RefSeq
3rd row: RefSeq
4th row: RefSeq
5th row: RefSeq

Common Values

Value  Count  Frequency (%)
prodigal  49210  98.4%
RefSeq  521  1.0%
Genbank  242  0.5%
DDBJ  20  < 0.1%
EMBL  7  < 0.1%

Length

[Figure: Histogram of lengths of the category]

[Figure: Common values plot]

Words

Value  Count  Frequency (%)
prodigal  49210  98.4%
refseq  521  1.0%
genbank  242  0.5%
ddbj  20  < 0.1%
embl  7  < 0.1%

Most occurring characters

Value  Count  Frequency (%)
a  49452  12.4%
r  49210  12.3%
p  49210  12.3%
o  49210  12.3%
d  49210  12.3%
i  49210  12.3%
g  49210  12.3%
l  49210  12.3%
e  1284  0.3%
R  521  0.1%
Other values (13)  2881  0.7%

Most occurring categories

Value  Count  Frequency (%)
(unknown)  398608  100.0%

Most occurring scripts

Value  Count  Frequency (%)
(unknown)  398608  100.0%

Most occurring blocks

Value  Count  Frequency (%)
(unknown)  398608  100.0%

Function_prediction_source
Categorical

High correlation, Missing

Distinct: 7
Distinct (%): < 0.1%
Missing: 22870
Missing (%): 45.7%
Memory size: 3.0 MiB

eggNOG-mapper  10992
Iterative search  9898
-  5450
RefSeq  521
Genbank  242
Other values (2)  27

Length

Max length: 16
Median length: 13
Mean length: 11.486989
Min length: 1

Characters and Unicode

Total characters: 311642
Distinct characters: 31
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: RefSeq
2nd row: RefSeq
3rd row: RefSeq
4th row: RefSeq
5th row: RefSeq

Common Values

Value  Count  Frequency (%)
eggNOG-mapper  10992  22.0%
Iterative search  9898  19.8%
-  5450  10.9%
RefSeq  521  1.0%
Genbank  242  0.5%
DDBJ  20  < 0.1%
EMBL  7  < 0.1%
(Missing)  22870  45.7%

Length

[Figure: Histogram of lengths of the category]

[Figure: Common values plot]

Words

Value  Count  Frequency (%)
eggnog-mapper  10992  29.7%
iterative  9898  26.7%
search  9898  26.7%
(empty)  5450  14.7%
refseq  521  1.4%
genbank  242  0.7%
ddbj  20  0.1%
embl  7  < 0.1%
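Two features of the word table above fit a simple tokenization: "Iterative search" is split into "iterative" and "search" (each counted 9898 times), and the "-" value appears to become an empty token, which would explain the row with count 5450 and no visible value. The function below is a hypothetical sketch of that behaviour (lowercase, split on whitespace, strip leading and trailing punctuation), not ydata-profiling's implementation.

```python
import string
from collections import Counter

def word_counts(values):
    """Lowercase each value, split it on whitespace, strip leading and
    trailing punctuation from every token, then count the tokens."""
    counts = Counter()
    for value in values:
        for token in value.lower().split():
            counts[token.strip(string.punctuation)] += 1
    return counts

counts = word_counts(["eggNOG-mapper", "Iterative search", "-", "RefSeq"])
print(counts)
```

Because only edge punctuation is stripped, the internal hyphen survives and "eggNOG-mapper" stays a single word.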

Most occurring characters

Value  Count  Frequency (%)
e  52962  17.0%
a  31030  10.0%
r  30788  9.9%
g  21984  7.1%
p  21984  7.1%
t  19796  6.4%
-  16442  5.3%
G  11234  3.6%
m  10992  3.5%
N  10992  3.5%
Other values (21)  83438  26.8%

Most occurring categories

Value  Count  Frequency (%)
(unknown)  311642  100.0%

Most occurring scripts

Value  Count  Frequency (%)
(unknown)  311642  100.0%

Most occurring blocks

Value  Count  Frequency (%)
(unknown)  311642  100.0%

Start
Real number (ℝ)

High correlation 

Distinct: 34302
Distinct (%): 68.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 29069.426
Minimum: 1
Maximum: 448958
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 390.8 KiB

Quantile statistics

Minimum: 1
5th percentile: 1312.95
Q1: 8944.75
median: 20886
Q3: 37679.25
95th percentile: 87545.15
Maximum: 448958
Range: 448957
Interquartile range (IQR): 28734.5

Descriptive statistics

Standard deviation: 31133.898
Coefficient of variation (CV): 1.0710187
Kurtosis: 14.952845
Mean: 29069.426
Median Absolute Deviation (MAD): 13465.5
Skewness: 2.9461042
Sum: 1.4534713 × 10^9
Variance: 9.6931959 × 10^8
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value  Count  Frequency (%)
1  216  0.4%
3  179  0.4%
2  178  0.4%
50  30  0.1%
61  12  < 0.1%
90  9  < 0.1%
4015  8  < 0.1%
1416  7  < 0.1%
2109  7  < 0.1%
22448  7  < 0.1%
Other values (34292)  49347  98.7%

Minimum 10 values

Value  Count  Frequency (%)
1  216  0.4%
2  178  0.4%
3  179  0.4%
6  2  < 0.1%
8  1  < 0.1%
10  1  < 0.1%
11  1  < 0.1%
12  1  < 0.1%
13  2  < 0.1%
14  3  < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
448958  1  < 0.1%
414661  1  < 0.1%
397002  1  < 0.1%
380976  1  < 0.1%
367073  1  < 0.1%
365881  1  < 0.1%
357531  1  < 0.1%
345250  1  < 0.1%
343402  1  < 0.1%
341337  1  < 0.1%
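Several of the Start statistics are derived from others, which makes them easy to cross-check: CV is standard deviation divided by mean, IQR is Q3 minus Q1, range is maximum minus minimum, the sum is the mean times the number of non-missing values, and the variance is the squared standard deviation. The sketch below recomputes them from the rounded figures shown above, so agreement is expected only to about the displayed precision.

```python
# Rounded Start statistics copied from the profile.
mean, std = 29069.426, 31133.898
q1, q3 = 8944.75, 37679.25
minimum, maximum = 1, 448958
n_non_missing = 50_000

cv = std / mean                  # coefficient of variation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range
total = mean * n_non_missing     # sum of all values
variance = std ** 2              # variance

print(f"CV = {cv:.7f}")
print(f"IQR = {iqr}, range = {value_range}")
print(f"sum = {total:.7e}, variance = {variance:.7e}")
```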

Stop
Real number (ℝ)

High correlation 

Distinct: 34692
Distinct (%): 69.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 29759.046
Minimum: 65
Maximum: 449674
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 390.8 KiB

Quantile statistics

Minimum: 65
5th percentile: 1974.95
Q1: 9667.25
median: 21581.5
Q3: 38343.25
95th percentile: 88182.65
Maximum: 449674
Range: 449609
Interquartile range (IQR): 28676

Descriptive statistics

Standard deviation: 31131.486
Coefficient of variation (CV): 1.0461184
Kurtosis: 14.975301
Mean: 29759.046
Median Absolute Deviation (MAD): 13463
Skewness: 2.9470997
Sum: 1.4879523 × 10^9
Variance: 9.6916945 × 10^8
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value  Count  Frequency (%)
1224  11  < 0.1%
12072  8  < 0.1%
3015  7  < 0.1%
5764  7  < 0.1%
11033  7  < 0.1%
1537  7  < 0.1%
2589  7  < 0.1%
9848  6  < 0.1%
10704  6  < 0.1%
14580  6  < 0.1%
Other values (34682)  49928  99.9%

Minimum 10 values

Value  Count  Frequency (%)
65  1  < 0.1%
66  1  < 0.1%
69  1  < 0.1%
71  2  < 0.1%
72  3  < 0.1%
73  1  < 0.1%
75  1  < 0.1%
78  1  < 0.1%
79  1  < 0.1%
81  2  < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
449674  1  < 0.1%
415482  1  < 0.1%
398222  1  < 0.1%
382007  1  < 0.1%
368425  1  < 0.1%
366084  1  < 0.1%
358202  1  < 0.1%
346206  1  < 0.1%
346113  1  < 0.1%
342080  1  < 0.1%
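The strong Start/Stop correlation is consistent with each Stop coordinate lying a comparatively short distance from its Start. Because neither column has missing values, the mean of the row-wise difference (Stop - Start) equals the difference of the two reported means, so it can be read off without the raw data. This assumes both columns use the same coordinate axis, which the report does not state explicitly; the toy coordinate pairs below are hypothetical.

```python
import math
from statistics import fmean

mean_start = 29069.426  # reported mean of Start
mean_stop = 29759.046   # reported mean of Stop

# With no missing values in either column, the mean of the row-wise
# difference equals the difference of the means (linearity of the mean).
mean_span = mean_stop - mean_start
print(f"Mean of (Stop - Start): {mean_span:.2f}")

# The same identity on a few hypothetical coordinate pairs:
starts = [1, 100, 250]
stops = [64, 400, 900]
spans = [stop - start for start, stop in zip(starts, stops)]
assert math.isclose(fmean(spans), fmean(stops) - fmean(starts))
```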

Strand
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.8 MiB

-  25109
+  24891

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 50000
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: -
2nd row: +
3rd row: -
4th row: +
5th row: -

Common Values

Value  Count  Frequency (%)
-  25109  50.2%
+  24891  49.8%

Length

[Figure: Histogram of lengths of the category]

[Figure: Common values plot]

Words

Value  Count  Frequency (%)
(empty)  50000  100.0%

Most occurring characters

Value  Count  Frequency (%)
-  25109  50.2%
+  24891  49.8%

Most occurring categories

Value  Count  Frequency (%)
(unknown)  50000  100.0%

Most occurring scripts

Value  Count  Frequency (%)
(unknown)  50000  100.0%

Most occurring blocks

Value  Count  Frequency (%)
(unknown)  50000  100.0%

Protein_ID
Text

Unique 

Distinct: 50000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 4.5 MiB

Length

Max length: 91
Median length: 87
Mean length: 37.52778
Min length: 7

Characters and Unicode

Total characters: 1876389
Distinct characters: 67
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 50000
Unique (%): 100.0%

Sample

1st row: YP_001994544.1
2nd row: YP_950675.1
3rd row: YP_006382285.1
4th row: YP_001469324.1
5th row: YP_006908410.1

Words

Value  Count  Frequency (%)
np_958633.1  1  < 0.1%
biochar_1064_29  1  < 0.1%
yp_001994544.1  1  < 0.1%
yp_950675.1  1  < 0.1%
yp_006382285.1  1  < 0.1%
yp_001469324.1  1  < 0.1%
yp_006908410.1  1  < 0.1%
yp_008052003.1  1  < 0.1%
yp_007010876.1  1  < 0.1%
yp_003969626.1  1  < 0.1%
Other values (49990)  49990  > 99.9%

Most occurring characters

Value  Count  Frequency (%)
0  195477  10.4%
_  187851  10.0%
3  118669  6.3%
1  108197  5.8%
2  97637  5.2%
4  89134  4.8%
5  89042  4.7%
8  88366  4.7%
9  79630  4.2%
7  77236  4.1%
Other values (57)  745150  39.7%

Most occurring categories

Value  Count  Frequency (%)
(unknown)  1876389  100.0%

Most occurring scripts

Value  Count  Frequency (%)
(unknown)  1876389  100.0%

Most occurring blocks

Value  Count  Frequency (%)
(unknown)  1876389  100.0%

Distinct: 4022
Distinct (%): 8.0%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 MiB

Length

Max length: 902
Median length: 761
Mean length: 25.97064
Min length: 3

Characters and Unicode

Total characters: 1298532
Distinct characters: 80
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 1782
Unique (%): 3.6%

Sample

1st row: hypothetical protein
2nd row: major tail protein
3rd row: major capsid protein
4th row: hypothetical protein
5th row: RNA polymerase sigma factor

Words

Value  Count  Frequency (%)
unknown  19689  11.8%
protein  12700  7.6%
of  4730  2.8%
hypothetical  4383  2.6%
the  3883  2.3%
domain  3698  2.2%
phage  3190  1.9%
family  2911  1.7%
dna  2844  1.7%
to  1995  1.2%
Other values (5284)  107471  64.2%

Most occurring characters

Value  Count  Frequency (%)
n  128202  9.9%
(space)  117511  9.0%
e  100221  7.7%
o  99374  7.7%
i  88001  6.8%
t  82210  6.3%
a  75139  5.8%
r  57074  4.4%
s  49455  3.8%
l  47447  3.7%
Other values (70)  453898  35.0%

Most occurring categories

Value  Count  Frequency (%)
(unknown)  1298532  100.0%

Most occurring scripts

Value  Count  Frequency (%)
(unknown)  1298532  100.0%

Most occurring blocks

Value  Count  Frequency (%)
(unknown)  1298532  100.0%

Distinct: 65
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.2 MiB

Length

Max length: 45
Median length: 9
Mean length: 10.45058
Min length: 6

Characters and Unicode

Total characters: 522529
Distinct characters: 25
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 5
Unique (%): < 0.1%

Sample

1st row: hypothetical;
2nd row: infection;
3rd row: assembly;
4th row: hypothetical;
5th row: replication;

Words

Value  Count  Frequency (%)
unsorted  27503  55.0%
hypothetical  4380  8.8%
assembly  3676  7.4%
replication  2426  4.9%
infection  1978  4.0%
packaging  1754  3.5%
lysis  1446  2.9%
assembly;infection  1434  2.9%
integration  1162  2.3%
regulation  1080  2.2%
Other values (55)  3161  6.3%
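Because these values contain no spaces, the word table counts a combined value such as "assembly;infection" as a single token rather than as one "assembly" and one "infection". To count individual labels, the values can be split on ";" instead. The sketch below assumes values are semicolon-terminated, as in the sample rows; the input rows are hypothetical.

```python
from collections import Counter

def label_counts(values):
    """Count individual labels in semicolon-delimited multi-label strings,
    ignoring empty pieces such as the one left by a trailing ';'."""
    counts = Counter()
    for value in values:
        for label in value.split(";"):
            label = label.strip()
            if label:
                counts[label] += 1
    return counts

# Hypothetical rows shaped like the sample values above.
rows = ["hypothetical;", "assembly;infection;", "infection;", "replication;"]
print(label_counts(rows))
```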

Most occurring characters

Value  Count  Frequency (%)
;  53910  10.3%
e  50139  9.6%
t  49592  9.5%
n  47148  9.0%
o  43303  8.3%
s  42419  8.1%
r  35398  6.8%
u  30714  5.9%
i  29793  5.7%
d  27676  5.3%
Other values (15)  112437  21.5%

Most occurring categories

Value  Count  Frequency (%)
(unknown)  522529  100.0%

Most occurring scripts

Value  Count  Frequency (%)
(unknown)  522529  100.0%

Most occurring blocks

Value  Count  Frequency (%)
(unknown)  522529  100.0%

Molecular_weight
Real number (ℝ)

High correlation 

Distinct: 44452
Distinct (%): 89.1%
Missing: 84
Missing (%): 0.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 4149.2137
Minimum: 75.0666
Maximum: 8770.8033
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 390.8 KiB

Quantile statistics

Minimum: 75.0666
5th percentile: 417.4176
Q1: 2048.0349
median: 4220.8564
Q3: 6254.9195
95th percentile: 7694.0663
Maximum: 8770.8033
Range: 8695.7367
Interquartile range (IQR): 4206.8846

Descriptive statistics

Standard deviation: 2375.7274
Coefficient of variation (CV): 0.57257293
Kurtosis: -1.243662
Mean: 4149.2137
Median Absolute Deviation (MAD): 2099.4409
Skewness: -0.052646883
Sum: 2.0711215 × 10^8
Variance: 5644080.9
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value  Count  Frequency (%)
131.1729  111  0.2%
146.1876  105  0.2%
147.1293  78  0.2%
89.0932  76  0.2%
174.201  58  0.1%
75.0666  56  0.1%
105.0926  48  0.1%
146.1445  46  0.1%
117.1463  45  0.1%
245.2755  43  0.1%
Other values (44442)  49250  98.5%
(Missing)  84  0.2%

Minimum 10 values

Value  Count  Frequency (%)
75.0666  56  0.1%
89.0932  76  0.2%
105.0926  48  0.1%
115.1305  9  < 0.1%
117.1463  45  0.1%
119.1192  28  0.1%
121.1582  5  < 0.1%
131.1729  111  0.2%
132.1179  38  0.1%
133.1027  41  0.1%

Maximum 10 values

Value  Count  Frequency (%)
8770.8033  1  < 0.1%
8690.9637  1  < 0.1%
8669.5906  1  < 0.1%
8665.8902  1  < 0.1%
8665.857  1  < 0.1%
8662.5385  1  < 0.1%
8639.6324  1  < 0.1%
8638.6479  1  < 0.1%
8632.7731  1  < 0.1%
8627.7922  1  < 0.1%
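The smallest Molecular_weight values match the average masses of single free amino acids (75.0666 Da for glycine, 89.0932 Da for alanine, 131.1729 Da for leucine or isoleucine), which suggests that some rows describe one-residue sequences. The sketch below assumes the usual average-mass calculation, the sum of free amino-acid masses minus one water per peptide bond (as in Biopython's molecular_weight); its lookup table covers only the residues whose masses appear in the tables above and is illustrative, not the pipeline that produced this column.

```python
# Average masses (Da) of free amino acids for the residues whose values
# appear in the tables above; a partial, illustrative lookup table.
FREE_AA_MASS = {
    "G": 75.0666, "A": 89.0932, "S": 105.0926, "P": 115.1305,
    "V": 117.1463, "T": 119.1192, "C": 121.1582, "L": 131.1729,
    "I": 131.1729, "N": 132.1179, "D": 133.1027, "Q": 146.1445,
    "K": 146.1876, "E": 147.1293, "R": 174.201,
}
WATER = 18.0153  # average mass of the water released per peptide bond (assumed)

def average_mass(seq):
    """Average mass of a peptide: sum of free residues minus (L - 1) waters."""
    if not seq:
        raise ValueError("empty sequence")
    return sum(FREE_AA_MASS[aa] for aa in seq) - (len(seq) - 1) * WATER

for seq in ["G", "K", "GA", "Q"]:
    print(seq, round(average_mass(seq), 4))
```

The dipeptide GA and the single residue Q share an elemental formula, so both give 146.1445 Da; a mass alone does not identify the sequence.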

Aromaticity
Real number (ℝ)

High correlation, Zeros

Distinct: 472
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.089780236
Minimum: 0
Maximum: 1
Zeros: 8261
Zeros (%): 16.5%
Negative: 0
Negative (%): 0.0%
Memory size: 390.8 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0.041666667
median: 0.083333333
Q3: 0.125
95th percentile: 0.2
Maximum: 1
Range: 1
Interquartile range (IQR): 0.083333333

Descriptive statistics

Standard deviation: 0.079517352
Coefficient of variation (CV): 0.88568883
Kurtosis: 32.038142
Mean: 0.089780236
Median Absolute Deviation (MAD): 0.041666667
Skewness: 3.596842
Sum: 4489.0118
Variance: 0.0063230093
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value  Count  Frequency (%)
0  8261  16.5%
0.1428571429  1055  2.1%
0.1111111111  970  1.9%
0.09090909091  967  1.9%
0.125  954  1.9%
0.1  952  1.9%
0.07692307692  800  1.6%
0.08333333333  799  1.6%
0.1666666667  774  1.5%
0.07142857143  703  1.4%
Other values (462)  33765  67.5%

Minimum 10 values

Value  Count  Frequency (%)
0  8261  16.5%
0.01428571429  22  < 0.1%
0.01449275362  25  0.1%
0.01470588235  19  < 0.1%
0.01492537313  22  < 0.1%
0.01515151515  18  < 0.1%
0.01538461538  34  0.1%
0.015625  17  < 0.1%
0.01587301587  18  < 0.1%
0.01612903226  29  0.1%

Maximum 10 values

Value  Count  Frequency (%)
1  86  0.2%
0.75  2  < 0.1%
0.6666666667  14  < 0.1%
0.625  1  < 0.1%
0.6  8  < 0.1%
0.5  158  0.3%
0.4736842105  1  < 0.1%
0.4545454545  1  < 0.1%
0.4444444444  1  < 0.1%
0.4285714286  14  < 0.1%
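The most common nonzero Aromaticity values are simple fractions such as 1/7 (0.1428571429) and 1/9 (0.1111111111), which is what a count-over-length ratio produces for short sequences. The sketch below assumes the common definition (the relative frequency of Phe, Trp and Tyr, as in Biopython's ProteinAnalysis.aromaticity); the profile itself does not state the formula.

```python
def aromaticity(seq):
    """Relative frequency of the aromatic residues Phe, Trp and Tyr."""
    if not seq:
        raise ValueError("empty sequence")
    return sum(seq.count(aa) for aa in "FWY") / len(seq)

print(aromaticity("MKFW"))     # 2 of 4 residues are aromatic
print(aromaticity("MKTAYLS"))  # 1 of 7
print(aromaticity("GAS"))      # no aromatic residues
```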

Instability_index
Real number (ℝ)

Zeros 

Distinct: 39281
Distinct (%): 78.7%
Missing: 84
Missing (%): 0.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 35.581297
Minimum: -86.5
Maximum: 388.53333
Zeros: 836
Zeros (%): 1.7%
Negative: 3261
Negative (%): 6.5%
Memory size: 390.8 KiB

Quantile statistics

Minimum: -86.5
5th percentile: -3.9905
Q1: 17.655254
median: 33.759336
Q3: 50.438469
95th percentile: 82.593357
Maximum: 388.53333
Range: 475.03333
Interquartile range (IQR): 32.783216

Descriptive statistics

Standard deviation: 29.032043
Coefficient of variation (CV): 0.8159355
Kurtosis: 5.7399298
Mean: 35.581297
Median Absolute Deviation (MAD): 16.372123
Skewness: 1.1322243
Sum: 1776076
Variance: 842.85954
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value  Count  Frequency (%)
0  836  1.7%
5  510  1.0%
6.666666667  321  0.6%
7.5  181  0.4%
8  138  0.3%
-13.725  94  0.2%
8.333333333  84  0.2%
55.65  82  0.2%
-37.45  79  0.2%
-21.63333333  78  0.2%
Other values (39271)  47513  95.0%
(Missing)  84  0.2%

Minimum 10 values

Value  Count  Frequency (%)
-86.5  1  < 0.1%
-79.55  1  < 0.1%
-72.525  4  < 0.1%
-71.73333333  4  < 0.1%
-70.15  21  < 0.1%
-69.1  2  < 0.1%
-68.56666667  2  < 0.1%
-67.65  1  < 0.1%
-65.3  1  < 0.1%
-60.6625  1  < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
388.5333333  1  < 0.1%
306.2666667  1  < 0.1%
299.6  1  < 0.1%
291.4  7  < 0.1%
289.8571429  1  < 0.1%
264.8  1  < 0.1%
261.8  6  < 0.1%
242.6222222  1  < 0.1%
240.85  1  < 0.1%
231.725  1  < 0.1%
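Assuming the column follows the Guruprasad et al. instability index, II = (10 / L) × sum of DIWV(x[i], x[i+1]) over the L - 1 adjacent dipeptides of a length-L sequence, several common values above have a simple explanation. If every dipeptide weight is 1.0, the index reduces to 10(L - 1)/L, giving 5, 6.666666667, 7.5, 8 and 8.333333333 for L = 2 to 6, while a one-residue sequence has no dipeptides and scores 0. The sketch uses a hypothetical, empty weight table with a default weight of 1.0; the real 400-entry DIWV table is not reproduced here.

```python
def instability_index(seq, diwv, default_weight=1.0):
    """Guruprasad-style instability index: (10 / L) * sum of dipeptide weights.

    `diwv` maps two-letter dipeptides to weights. Here it is a hypothetical,
    partial table; dipeptides missing from it fall back to `default_weight`.
    """
    length = len(seq)
    if length == 0:
        raise ValueError("empty sequence")
    total = sum(diwv.get(seq[i:i + 2], default_weight) for i in range(length - 1))
    return 10.0 / length * total

# With every dipeptide weighted 1.0, II = 10 * (L - 1) / L.
for seq in ["M", "MA", "MAK", "MAKT", "MAKTS"]:
    print(len(seq), round(instability_index(seq, {}), 4))
```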

Isoelectric_point
Real number (ℝ)

Distinct: 19776
Distinct (%): 39.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.8118843
Minimum: 4.0500284
Maximum: 11.999968
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 390.8 KiB

Quantile statistics

Minimum: 4.0500284
5th percentile: 4.0500284
Q1: 4.6111429
median: 6.0682673
Q3: 9.1348639
95th percentile: 10.655415
Maximum: 11.999968
Range: 7.9499393
Interquartile range (IQR): 4.5237209

Descriptive statistics

Standard deviation: 2.3625355
Coefficient of variation (CV): 0.34682554
Kurtosis: -1.2634447
Mean: 6.8118843
Median Absolute Deviation (MAD): 1.9033197
Skewness: 0.41950469
Sum: 340594.22
Variance: 5.5815738
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value  Count  Frequency (%)
4.050028419  4503  9.0%
5.525000191  838  1.7%
11.99996777  551  1.1%
8.750052071  448  0.9%
9.750021172  216  0.4%
5.57001667  181  0.4%
11.00083675  155  0.3%
5.240009499  141  0.3%
4.370259285  141  0.3%
5.494989204  138  0.3%
Other values (19766)  42688  85.4%

Minimum 10 values

Value  Count  Frequency (%)
4.050028419  4503  9.0%
4.051619911  1  < 0.1%
4.052074623  1  < 0.1%
4.05224514  1  < 0.1%
4.052586174  1  < 0.1%
4.052756691  1  < 0.1%
4.052984047  1  < 0.1%
4.053097725  2  < 0.1%
4.053211403  1  < 0.1%
4.053268242  1  < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
11.99996777  551  1.1%
11.94213963  1  < 0.1%
11.936273  2  < 0.1%
11.93047085  1  < 0.1%
11.92453976  1  < 0.1%
11.91706142  1  < 0.1%
11.91042118  1  < 0.1%
11.90887394  2  < 0.1%
11.90784245  1  < 0.1%
11.90552158  1  < 0.1%
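The spikes at 4.050028419 (9.0%) and 11.99996777 (1.1%) sit just inside 4.05 and 12, which appear to be the bounds of a bisection search for the pH at which net charge is zero (they match the default bounds of Biopython's IsoelectricPoint.pi). When a sequence's modelled charge never changes sign inside the interval, the search converges onto a bound. The sketch below shows this with a generic bisection; the charge functions are toy stand-ins, not a real pKa model.

```python
def bisect_pi(net_charge, lo=4.05, hi=12.0, tol=1e-4):
    """Bisect for the pH at which net_charge(pH) crosses zero.

    net_charge must decrease with pH, as a protein's net charge does.
    If it never changes sign inside [lo, hi], the result ends up within
    `tol` of whichever bound the search is pushed towards.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(mid) > 0:
            lo = mid  # still positively charged: the crossing lies above mid
        else:
            hi = mid  # zero or negative: the crossing lies at or below mid
    return (lo + hi) / 2

# Toy charge models (hypothetical, not derived from pKa values):
print(bisect_pi(lambda pH: 7.0 - pH))  # crosses zero at pH 7
print(bisect_pi(lambda pH: -1.0))      # always negative: pinned near 4.05
print(bisect_pi(lambda pH: 1.0))       # always positive: pinned near 12
```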

Helix_fraction
Real number (ℝ)

Zeros 

Distinct: 1179
Distinct (%): 2.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.29488597
Minimum: 0
Maximum: 1
Zeros: 2231
Zeros (%): 4.5%
Negative: 0
Negative (%): 0.0%
Memory size: 390.8 KiB

Quantile statistics

Minimum: 0
5th percentile: 0.074074074
Q1: 0.23684211
median: 0.2962963
Q3: 0.35294118
95th percentile: 0.49056604
Maximum: 1
Range: 1
Interquartile range (IQR): 0.11609907

Descriptive statistics

Standard deviation: 0.12564571
Coefficient of variation (CV): 0.42608236
Kurtosis: 6.2035764
Mean: 0.29488597
Median Absolute Deviation (MAD): 0.057549858
Skewness: 0.90052166
Sum: 14744.299
Variance: 0.015786845
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value  Count  Frequency (%)
0.3333333333  2510  5.0%
0  2231  4.5%
0.25  1581  3.2%
0.2857142857  1170  2.3%
0.5  1046  2.1%
0.2  809  1.6%
0.3  793  1.6%
0.4  694  1.4%
0.2727272727  567  1.1%
0.3076923077  552  1.1%
Other values (1169)  38047  76.1%

Minimum 10 values

Value  Count  Frequency (%)
0  2231  4.5%
0.01449275362  1  < 0.1%
0.015625  1  < 0.1%
0.01639344262  1  < 0.1%
0.01851851852  1  < 0.1%
0.02040816327  2  < 0.1%
0.02173913043  1  < 0.1%
0.02222222222  4  < 0.1%
0.02272727273  3  < 0.1%
0.025  2  < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
1  314  0.6%
0.9090909091  1  < 0.1%
0.875  3  < 0.1%
0.8571428571  1  < 0.1%
0.8571428571  1  < 0.1%
0.8333333333  2  < 0.1%
0.8  10  < 0.1%
0.75  45  0.1%
0.7333333333  1  < 0.1%
0.7272727273  1  < 0.1%
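The maximum-values table lists 0.8571428571 twice (the Turn_fraction and Sheet_fraction tables show similar repeats). Such rows are presumably distinct floating-point values that agree to the ten significant digits displayed, which is what happens when a fraction is built by summing per-residue fractions in different combinations. The sketch below assumes the residue groups of Biopython's ProteinAnalysis.secondary_structure_fraction (helix: V, I, Y, F, W, L; turn: N, P, G, S; sheet: E, M, A, L); the profile does not document its formula.

```python
# Residue groups assumed to follow Biopython's secondary_structure_fraction.
HELIX, TURN, SHEET = "VIYFWL", "NPGS", "EMAL"

def secondary_structure_fraction(seq):
    """Return (helix, turn, sheet) as sums of per-residue fractions."""
    length = len(seq)
    fraction = {aa: seq.count(aa) / length for aa in set(seq)}

    def group_sum(group):
        return sum(fraction.get(aa, 0.0) for aa in group)

    return group_sum(HELIX), group_sum(TURN), group_sum(SHEET)

print(secondary_structure_fraction("VLNGAE"))

# Float sums that differ in the last bits can still print identically
# at 10 significant digits.
a, b = 0.1 + 0.2, 0.3
print(a == b, f"{a:.10g}" == f"{b:.10g}")
```

Leucine is counted in both the helix and sheet groups under this assumption, so the three fractions need not sum to at most 1.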

Turn_fraction
Real number (ℝ)

Zeros 

Distinct: 904
Distinct (%): 1.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.2061058
Minimum: 0
Maximum: 1
Zeros: 2985
Zeros (%): 6.0%
Negative: 0
Negative (%): 0.0%
Memory size: 390.8 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0.14285714
median: 0.2
Q3: 0.25531915
95th percentile: 0.37931034
Maximum: 1
Range: 1
Interquartile range (IQR): 0.11246201

Descriptive statistics

Standard deviation: 0.11392829
Coefficient of variation (CV): 0.55276607
Kurtosis: 9.5106599
Mean: 0.2061058
Median Absolute Deviation (MAD): 0.057142857
Skewness: 1.7315084
Sum: 10305.29
Variance: 0.012979656
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value  Count  Frequency (%)
0  2985  6.0%
0.25  1643  3.3%
0.2  1547  3.1%
0.1666666667  1323  2.6%
0.3333333333  1232  2.5%
0.1428571429  1049  2.1%
0.2222222222  793  1.6%
0.125  698  1.4%
0.1818181818  683  1.4%
0.2857142857  638  1.3%
Other values (894)  37409  74.8%

Minimum 10 values

Value  Count  Frequency (%)
0  2985  6.0%
0.01612903226  1  < 0.1%
0.01639344262  1  < 0.1%
0.02040816327  2  < 0.1%
0.02083333333  2  < 0.1%
0.02173913043  1  < 0.1%
0.02222222222  1  < 0.1%
0.02325581395  1  < 0.1%
0.02380952381  1  < 0.1%
0.025  3  < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
1  189  0.4%
0.935483871  1  < 0.1%
0.8823529412  1  < 0.1%
0.875  1  < 0.1%
0.8571428571  1  < 0.1%
0.8571428571  1  < 0.1%
0.8571428571  2  < 0.1%
0.8333333333  2  < 0.1%
0.8181818182  1  < 0.1%
0.8  10  < 0.1%

Sheet_fraction
Real number (ℝ)

Zeros 

Distinct: 1001
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.2576687
Minimum: 0
Maximum: 1
Zeros: 2367
Zeros (%): 4.7%
Negative: 0
Negative (%): 0.0%
Memory size: 390.8 KiB

Quantile statistics

Minimum: 0
5th percentile: 0.051282051
Q1: 0.1875
median: 0.25
Q3: 0.32
95th percentile: 0.45454545
Maximum: 1
Range: 1
Interquartile range (IQR): 0.1325

Descriptive statistics

Standard deviation: 0.12874943
Coefficient of variation (CV): 0.49967043
Kurtosis: 6.8258445
Mean: 0.2576687
Median Absolute Deviation (MAD): 0.066326531
Skewness: 1.3772987
Sum: 12883.435
Variance: 0.016576416
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value  Count  Frequency (%)
0  2367  4.7%
0.3333333333  1857  3.7%
0.25  1780  3.6%
0.2  1258  2.5%
0.2857142857  891  1.8%
0.1666666667  885  1.8%
0.5  858  1.7%
0.2222222222  732  1.5%
0.3  627  1.3%
0.1428571429  622  1.2%
Other values (991)  38123  76.2%

Minimum 10 values

Value  Count  Frequency (%)
0  2367  4.7%
0.02173913043  1  < 0.1%
0.02325581395  1  < 0.1%
0.0243902439  1  < 0.1%
0.025  1  < 0.1%
0.02857142857  2  < 0.1%
0.02941176471  2  < 0.1%
0.0303030303  1  < 0.1%
0.03125  4  < 0.1%
0.03333333333  3  < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
1  322  0.6%
0.875  1  < 0.1%
0.8666666667  1  < 0.1%
0.8333333333  1  < 0.1%
0.8333333333  7  < 0.1%
0.8181818182  1  < 0.1%
0.8181818182  1  < 0.1%
0.8  17  < 0.1%
0.7857142857  1  < 0.1%
0.75  61  0.1%

Reduced_coefficient
Real number (ℝ)

High correlation, Zeros

Distinct: 80
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5004.6842
Minimum: 0
Maximum: 45490
Zeros: 13696
Zeros (%): 27.4%
Negative: 0
Negative (%): 0.0%
Memory size: 390.8 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
median: 2980
Q3: 7450
95th percentile: 15470
Maximum: 45490
Range: 45490
Interquartile range (IQR): 7450

Descriptive statistics

Standard deviation5526.0936
Coefficient of variation (CV)1.1041843
Kurtosis2.6617344
Mean5004.6842
Median Absolute Deviation (MAD)2980
Skewness1.4823222
Sum2.5023421 × 10⁸
Variance30537710
MonotonicityNot monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value Count Frequency (%)
0 13696 27.4%
1490 8108 16.2%
2980 4944 9.9%
6990 3007 6.0%
5500 2959 5.9%
4470 2774 5.5%
8480 2635 5.3%
9970 1726 3.5%
5960 1472 2.9%
12490 1056 2.1%
Other values (70) 7623 15.2%
Minimum 10 values
Value Count Frequency (%)
0 13696 27.4%
1490 8108 16.2%
2980 4944 9.9%
4470 2774 5.5%
5500 2959 5.9%
5960 1472 2.9%
6990 3007 6.0%
7450 660 1.3%
8480 2635 5.3%
8940 313 0.6%
Maximum 10 values
Value Count Frequency (%)
45490 2 < 0.1%
44920 1 < 0.1%
44000 1 < 0.1%
41940 1 < 0.1%
40450 1 < 0.1%
39990 1 < 0.1%
38960 1 < 0.1%
38500 1 < 0.1%
38390 1 < 0.1%
37930 3 < 0.1%
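Each of the ten most common Reduced_coefficient values is a small integer combination of 5500 and 1490. For example, 2980 = 2 × 1490, 6990 = 5500 + 1490 and 12490 = 2 × 5500 + 1490. That is the pattern produced by the widely used estimate of a protein's molar extinction coefficient at 280 nm, ε = 5500·n(Trp) + 1490·n(Tyr) M⁻¹ cm⁻¹. This is also what Biopython's `ProteinAnalysis.molar_extinction_coefficient()` reports as its reduced value. The report itself does not say how the column was computed, so treat that formula as an assumption. Under it, every value can be mapped back to residue counts:

```python
from typing import Optional, Tuple

TRP_EPS = 5500  # assumed contribution per tryptophan (M^-1 cm^-1)
TYR_EPS = 1490  # assumed contribution per tyrosine (M^-1 cm^-1)


def decompose(value: int) -> Optional[Tuple[int, int]]:
    """Return (n_trp, n_tyr) with TRP_EPS*n_trp + TYR_EPS*n_tyr == value, or None.

    Tries the fewest tryptophans first; the answer is unique for every value
    below 149 * 5500, which covers the whole observed range (max 45490).
    """
    for n_trp in range(value // TRP_EPS + 1):
        remainder = value - n_trp * TRP_EPS
        if remainder % TYR_EPS == 0:
            return n_trp, remainder // TYR_EPS
    return None


# The ten most common Reduced_coefficient values from the report
for v in [0, 1490, 2980, 6990, 5500, 4470, 8480, 9970, 5960, 12490]:
    n_trp, n_tyr = decompose(v)
    print(f"{v:>6} = {n_trp} x {TRP_EPS} + {n_tyr} x {TYR_EPS}")
```

Under the same assumption, the 27.4% zeros are simply proteins with neither tryptophan nor tyrosine. The largest value, 45490, decomposes as 8 Trp + 1 Tyr.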

Oxidized_coefficient
Real number (ℝ)

High correlation  Zeros 

Distinct227
Distinct (%)0.5%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean5023.2342
Minimum0
Maximum45490
Zeros13220
Zeros (%)26.4%
Negative0
Negative (%)0.0%
Memory size390.8 KiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median2980
Q37575
95-th percentile15720
Maximum45490
Range45490
Interquartile range (IQR)7575

Descriptive statistics

Standard deviation5536.0772
Coefficient of variation (CV)1.1020942
Kurtosis2.6467933
Mean5023.2342
Median Absolute Deviation (MAD)2980
Skewness1.4791047
Sum2.5116171 × 10⁸
Variance30648150
MonotonicityNot monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value Count Frequency (%)
0 13220 26.4%
1490 7426 14.9%
2980 4331 8.7%
5500 2688 5.4%
6990 2581 5.2%
4470 2303 4.6%
8480 2157 4.3%
9970 1383 2.8%
5960 1193 2.4%
12490 897 1.8%
Other values (217) 11821 23.6%
Minimum 10 values
Value Count Frequency (%)
0 13220 26.4%
125 407 0.8%
250 56 0.1%
375 11 < 0.1%
500 1 < 0.1%
625 1 < 0.1%
1490 7426 14.9%
1615 555 1.1%
1740 99 0.2%
1865 25 0.1%
Maximum 10 values
Value Count Frequency (%)
45490 2 < 0.1%
44920 1 < 0.1%
44125 1 < 0.1%
42065 1 < 0.1%
40575 1 < 0.1%
39990 1 < 0.1%
38960 1 < 0.1%
38515 1 < 0.1%
38500 1 < 0.1%
37930 3 < 0.1%
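Oxidized_coefficient repeats the Reduced_coefficient values plus steps of 125. The smallest values are 0, 125, 250, 375, 500, 625, 1490, 1615, 1740 and 1865, and the two columns correlate at 0.998. This fits the common convention of adding 125 M⁻¹ cm⁻¹ per cystine, that is, per pair of cysteines assumed to form a disulfide bond. Biopython's `molar_extinction_coefficient()` computes its second returned value this way. The report does not confirm the formula, so the following is a sketch under that assumption:

```python
from typing import Tuple

TRP, TYR, CYSTINE = 5500, 1490, 125  # assumed per-Trp, per-Tyr and per-disulfide contributions


def extinction_coefficients(seq: str) -> Tuple[int, int]:
    """Return (reduced, oxidized) molar extinction coefficients for a protein sequence.

    Reduced counts Trp and Tyr only. Oxidized additionally assumes every pair of
    cysteines forms a cystine, so an unpaired cysteine contributes nothing.
    """
    seq = seq.upper()
    reduced = TRP * seq.count("W") + TYR * seq.count("Y")
    oxidized = reduced + CYSTINE * (seq.count("C") // 2)
    return reduced, oxidized


# Hypothetical sequences, chosen only to illustrate the arithmetic
print(extinction_coefficients("MCCK"))    # (0, 125): one cystine, no Trp or Tyr
print(extinction_coefficients("MWYCCC"))  # (6990, 7115): 1 Trp + 1 Tyr, three Cys form one cystine
```

Integer division by 2 is what keeps an odd cysteine count from adding a partial step.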

Phage_source
Categorical

High correlation 

Distinct14
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size2.9 MiB
IMG_VR 13985
MGV 12188
GPD 8837
GOV2 6091
TemPhD 4053
Other values (9) 4846

Length

Max length8
Median length7
Mean length4.34912
Min length3

Characters and Unicode

Total characters217456
Distinct characters29
Distinct categories1
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowRefSeq
2nd rowRefSeq
3rd rowRefSeq
4th rowRefSeq
5th rowRefSeq

Common Values

Value Count Frequency (%)
IMG_VR 13985 28.0%
MGV 12188 24.4%
GPD 8837 17.7%
GOV2 6091 12.2%
TemPhD 4053 8.1%
CHVD 2240 4.5%
GVD 853 1.7%
RefSeq 521 1.0%
PhagesDB 409 0.8%
IGVD 408 0.8%
Other values (4) 415 0.8%

Length

[Figure: Histogram of lengths of the category]
Words
Value Count Frequency (%)
img_vr 13985 28.0%
mgv 12188 24.4%
gpd 8837 17.7%
gov2 6091 12.2%
temphd 4053 8.1%
chvd 2240 4.5%
gvd 853 1.7%
refseq 521 1.0%
phagesdb 409 0.8%
igvd 408 0.8%
Other values (4) 415 0.8%

Most occurring characters

Value Count Frequency (%)
G 42604 19.6%
V 35911 16.5%
M 26180 12.0%
D 16840 7.7%
R 14506 6.7%
I 14393 6.6%
_ 13985 6.4%
P 13299 6.1%
O 6091 2.8%
2 6091 2.8%
Other values (19) 27556 12.7%
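Character counts are value counts weighted by row frequency: each category contributes every character it contains, once per row. A small sketch using only the ten Phage_source categories named in Common Values is shown below. The four categories folded into "Other values (4)" are not listed, so their characters are missing from the sketch. Characters that appear only in the named categories come out at exactly the reported counts: "_" (from IMG_VR) and "2" and "O" (both from GOV2).

```python
from collections import Counter

# Ten most common Phage_source values and their row counts, from Common Values
value_counts = {
    "IMG_VR": 13985, "MGV": 12188, "GPD": 8837, "GOV2": 6091, "TemPhD": 4053,
    "CHVD": 2240, "GVD": 853, "RefSeq": 521, "PhagesDB": 409, "IGVD": 408,
}

char_counts = Counter()
for value, rows in value_counts.items():
    for ch in value:  # a repeated letter (e.g. the two e's in "RefSeq") counts twice
        char_counts[ch] += rows

print(char_counts["_"], char_counts["2"], char_counts["O"])  # 13985 6091 6091
print(sum(char_counts.values()))  # 215216 of the 217456 reported; the rest come from the other 415 rows
```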

Most occurring categories

Value Count Frequency (%)
(unknown) 217456 100.0%

Most frequent character per category

(unknown): identical to the Most occurring characters table above.

Most occurring scripts

Value Count Frequency (%)
(unknown) 217456 100.0%

Most frequent character per script

(unknown): identical to the Most occurring characters table above.

Most occurring blocks

Value Count Frequency (%)
(unknown) 217456 100.0%

Most frequent character per block

(unknown): identical to the Most occurring characters table above.

Function_Prediction_source
Categorical

High correlation  Missing 

Distinct3
Distinct (%)< 0.1%
Missing27130
Missing (%)54.3%
Memory size2.8 MiB
- 12421
eggNOG-mapper 8735
Iterative search 1714

Length

Max length16
Median length1
Mean length6.707477
Min length1
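The length statistics are computed over the 22870 non-missing values only (50000 rows minus 27130 missing). The three observed values account for all 153400 characters. Dividing by 22870 rather than 50000 gives the reported mean length. Because "-" alone fills more than half of those rows, the median length is 1. A quick check:

```python
import statistics

# Non-missing Function_Prediction_source values and their counts (Common Values)
counts = {"-": 12421, "eggNOG-mapper": 8735, "Iterative search": 1714}

non_missing = sum(counts.values())                        # 50000 - 27130 missing
total_chars = sum(len(v) * c for v, c in counts.items())  # matches "Total characters"
mean_length = total_chars / non_missing
# Expand to one length per non-missing row before taking the median
median_length = statistics.median([len(v) for v, c in counts.items() for _ in range(c)])

print(non_missing, total_chars, round(mean_length, 6), median_length)  # 22870 153400 6.707477 1.0
```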

Characters and Unicode

Total characters153400
Distinct characters18
Distinct categories1
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st roweggNOG-mapper
2nd row-
3rd roweggNOG-mapper
4th row-
5th roweggNOG-mapper

Common Values

Value Count Frequency (%)
- 12421 24.8%
eggNOG-mapper 8735 17.5%
Iterative search 1714 3.4%
(Missing) 27130 54.3%

Length

[Figure: Histogram of lengths of the category]

[Figure: Common Values (Plot)]
Words
Value Count Frequency (%)
(blank) 12421 50.5%
eggnog-mapper 8735 35.5%
iterative 1714 7.0%
search 1714 7.0%

Most occurring characters

Value Count Frequency (%)
e 22612 14.7%
- 21156 13.8%
g 17470 11.4%
p 17470 11.4%
a 12163 7.9%
r 12163 7.9%
G 8735 5.7%
O 8735 5.7%
N 8735 5.7%
m 8735 5.7%
Other values (8) 15426 10.1%

Most occurring categories

Value Count Frequency (%)
(unknown) 153400 100.0%

Most frequent character per category

(unknown): identical to the Most occurring characters table above.

Most occurring scripts

Value Count Frequency (%)
(unknown) 153400 100.0%

Most frequent character per script

(unknown): identical to the Most occurring characters table above.

Most occurring blocks

Value Count Frequency (%)
(unknown) 153400 100.0%

Most frequent character per block

(unknown): identical to the Most occurring characters table above.

Interactions


Correlations

Column numbers refer to the variables in the same order as the rows: 1 Aromaticity, 2 Function_Prediction_source, 3 Function_prediction_source, 4 Helix_fraction, 5 Instability_index, 6 Isoelectric_point, 7 Molecular_weight, 8 Oxidized_coefficient, 9 Phage_source, 10 Protein_source, 11 Reduced_coefficient, 12 Sheet_fraction, 13 Start, 14 Stop, 15 Strand, 16 Turn_fraction.

                                 1      2      3      4      5      6      7      8      9     10     11     12     13     14     15     16
Aromaticity                  1.000  0.011  0.023  0.464 -0.010 -0.010  0.208  0.600  0.020  0.010  0.603 -0.233  0.040  0.039  0.001 -0.039
Function_Prediction_source   0.011  1.000  0.000  0.039  0.016  0.045  0.031  0.000  0.333  1.000  0.000  0.060  0.066  0.065  0.012  0.072
Function_prediction_source   0.023  0.000  1.000  0.039  0.002  0.021  0.022  0.012  0.822  1.000  0.012  0.001  0.095  0.094  0.053  0.028
Helix_fraction               0.464  0.039  0.039  1.000 -0.136 -0.053  0.068  0.248  0.016  0.000  0.252 -0.062  0.042  0.041  0.013 -0.196
Instability_index           -0.010  0.016  0.002 -0.136  1.000 -0.036  0.171  0.085  0.000  0.000  0.081  0.147  0.003  0.002  0.010  0.016
Isoelectric_point           -0.010  0.045  0.021 -0.053 -0.036  1.000  0.050  0.025  0.022  0.000  0.026 -0.287 -0.006 -0.006  0.006  0.002
Molecular_weight             0.208  0.031  0.022  0.068  0.171  0.050  1.000  0.629  0.012  0.003  0.622  0.035  0.013  0.011  0.000  0.019
Oxidized_coefficient         0.600  0.000  0.012  0.248  0.085  0.025  0.629  1.000  0.013  0.009  0.998 -0.110  0.025  0.024  0.000  0.005
Phage_source                 0.020  0.333  0.822  0.016  0.000  0.022  0.012  0.013  1.000  1.000  0.014  0.023  0.077  0.078  0.060  0.020
Protein_source               0.010  1.000  1.000  0.000  0.000  0.000  0.003  0.009  1.000  1.000  0.010  0.000  0.079  0.079  0.041  0.000
Reduced_coefficient          0.603  0.000  0.012  0.252  0.081  0.026  0.622  0.998  0.014  0.010  1.000 -0.108  0.025  0.023  0.000  0.004
Sheet_fraction              -0.233  0.060  0.001 -0.062  0.147 -0.287  0.035 -0.110  0.023  0.000 -0.108  1.000 -0.021 -0.025  0.017 -0.333
Start                        0.040  0.066  0.095  0.042  0.003 -0.006  0.013  0.025  0.077  0.079  0.025 -0.021  1.000  0.999  0.005 -0.010
Stop                         0.039  0.065  0.094  0.041  0.002 -0.006  0.011  0.024  0.078  0.079  0.023 -0.025  0.999  1.000  0.000 -0.005
Strand                       0.001  0.012  0.053  0.013  0.010  0.006  0.000  0.000  0.060  0.041  0.000  0.017  0.005  0.000  1.000  0.008
Turn_fraction               -0.039  0.072  0.028 -0.196  0.016  0.002  0.019  0.005  0.020  0.000  0.004 -0.333 -0.010 -0.005  0.008  1.000
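The High correlation alerts in the overview can be recovered from this matrix by listing every pair whose coefficient exceeds 0.5 in absolute value. The 0.5 cut-off is an assumption, chosen because it reproduces exactly the ten pairs behind those alerts. Helix_fraction and Aromaticity, at 0.464, fall just below it. Entries involving categorical variables such as Phage_source, Protein_source or Strand are never negative. They are therefore likely an unsigned association measure rather than a signed correlation, so read them as strengths only.

```python
from itertools import combinations

# Variable order shared by the rows and columns of the matrix above
names = [
    "Aromaticity", "Function_Prediction_source", "Function_prediction_source",
    "Helix_fraction", "Instability_index", "Isoelectric_point", "Molecular_weight",
    "Oxidized_coefficient", "Phage_source", "Protein_source", "Reduced_coefficient",
    "Sheet_fraction", "Start", "Stop", "Strand", "Turn_fraction",
]

# Matrix copied row by row from the report
matrix = [
    [1.000, 0.011, 0.023, 0.464, -0.010, -0.010, 0.208, 0.600, 0.020, 0.010, 0.603, -0.233, 0.040, 0.039, 0.001, -0.039],
    [0.011, 1.000, 0.000, 0.039, 0.016, 0.045, 0.031, 0.000, 0.333, 1.000, 0.000, 0.060, 0.066, 0.065, 0.012, 0.072],
    [0.023, 0.000, 1.000, 0.039, 0.002, 0.021, 0.022, 0.012, 0.822, 1.000, 0.012, 0.001, 0.095, 0.094, 0.053, 0.028],
    [0.464, 0.039, 0.039, 1.000, -0.136, -0.053, 0.068, 0.248, 0.016, 0.000, 0.252, -0.062, 0.042, 0.041, 0.013, -0.196],
    [-0.010, 0.016, 0.002, -0.136, 1.000, -0.036, 0.171, 0.085, 0.000, 0.000, 0.081, 0.147, 0.003, 0.002, 0.010, 0.016],
    [-0.010, 0.045, 0.021, -0.053, -0.036, 1.000, 0.050, 0.025, 0.022, 0.000, 0.026, -0.287, -0.006, -0.006, 0.006, 0.002],
    [0.208, 0.031, 0.022, 0.068, 0.171, 0.050, 1.000, 0.629, 0.012, 0.003, 0.622, 0.035, 0.013, 0.011, 0.000, 0.019],
    [0.600, 0.000, 0.012, 0.248, 0.085, 0.025, 0.629, 1.000, 0.013, 0.009, 0.998, -0.110, 0.025, 0.024, 0.000, 0.005],
    [0.020, 0.333, 0.822, 0.016, 0.000, 0.022, 0.012, 0.013, 1.000, 1.000, 0.014, 0.023, 0.077, 0.078, 0.060, 0.020],
    [0.010, 1.000, 1.000, 0.000, 0.000, 0.000, 0.003, 0.009, 1.000, 1.000, 0.010, 0.000, 0.079, 0.079, 0.041, 0.000],
    [0.603, 0.000, 0.012, 0.252, 0.081, 0.026, 0.622, 0.998, 0.014, 0.010, 1.000, -0.108, 0.025, 0.023, 0.000, 0.004],
    [-0.233, 0.060, 0.001, -0.062, 0.147, -0.287, 0.035, -0.110, 0.023, 0.000, -0.108, 1.000, -0.021, -0.025, 0.017, -0.333],
    [0.040, 0.066, 0.095, 0.042, 0.003, -0.006, 0.013, 0.025, 0.077, 0.079, 0.025, -0.021, 1.000, 0.999, 0.005, -0.010],
    [0.039, 0.065, 0.094, 0.041, 0.002, -0.006, 0.011, 0.024, 0.078, 0.079, 0.023, -0.025, 0.999, 1.000, 0.000, -0.005],
    [0.001, 0.012, 0.053, 0.013, 0.010, 0.006, 0.000, 0.000, 0.060, 0.041, 0.000, 0.017, 0.005, 0.000, 1.000, 0.008],
    [-0.039, 0.072, 0.028, -0.196, 0.016, 0.002, 0.019, 0.005, 0.020, 0.000, 0.004, -0.333, -0.010, -0.005, 0.008, 1.000],
]

THRESHOLD = 0.5  # assumed alert cut-off

# Visit each unordered pair once (upper triangle, diagonal excluded)
high_pairs = [
    (names[i], names[j], matrix[i][j])
    for i, j in combinations(range(len(names)), 2)
    if abs(matrix[i][j]) > THRESHOLD
]

for a, b, r in high_pairs:
    print(f"{a} ~ {b}: {r:.3f}")
```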

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.]
[Figure: The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.]
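The missing counts of Function_prediction_source (22870) and Function_Prediction_source (27130) add up to exactly 50000. In the sample rows, every row fills exactly one of the two columns: RefSeq rows use the first, prodigal rows the second. The report suggests but does not show that this holds for all rows. If it does, the two columns have a nullity correlation of −1 and are possibly one field split by a capitalisation difference in the column name. The following is a stdlib-only sketch on hypothetical rows shaped like the sample:

```python
import math
from typing import List, Optional

# Hypothetical rows: (Function_prediction_source, Function_Prediction_source), None = missing
rows = [
    ("RefSeq", None), ("RefSeq", None), ("RefSeq", None),
    (None, "-"), (None, "eggNOG-mapper"), (None, "Iterative search"),
]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(sxx * syy)


# Nullity indicators: 1.0 where the value is missing, 0.0 where it is present
null_lower = [1.0 if lower is None else 0.0 for lower, _ in rows]
null_upper = [1.0 if upper is None else 0.0 for _, upper in rows]
print(pearson(null_lower, null_upper))  # -1.0: exactly one column is filled per row

# Safe only if the columns never overlap: keep whichever value is present
merged: List[Optional[str]] = [lower if lower is not None else upper for lower, upper in rows]
print(merged)
```

Before merging the real columns, it is worth confirming that no row has both values present or both missing.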

Sample

First rows
Index | Phage_ID | Protein_source | Function_prediction_source | Start | Stop | Strand | Protein_ID | Product | Protein_classification | Molecular_weight | Aromaticity | Instability_index | Isoelectric_point | Helix_fraction | Turn_fraction | Sheet_fraction | Reduced_coefficient | Oxidized_coefficient | Phage_source | Function_Prediction_source
0 | NC_011019.1 | RefSeq | RefSeq | 51457 | 51993 | - | YP_001994544.1 | hypothetical protein | hypothetical; | 4439.9769 | 0.078947 | 40.926316 | 5.134573 | 0.236842 | 0.131579 | 0.394737 | 4470 | 4470 | RefSeq | NaN
1 | NC_008723.1 | RefSeq | RefSeq | 8638 | 9177 | + | YP_950675.1 | major tail protein | infection; | 4196.4307 | 0.102564 | 36.107692 | 4.050028 | 0.205128 | 0.333333 | 0.333333 | 13980 | 13980 | RefSeq | NaN
2 | NC_017968.1 | RefSeq | RefSeq | 30673 | 31647 | - | YP_006382285.1 | major capsid protein | assembly; | 4879.5720 | 0.045455 | 48.911364 | 6.037461 | 0.272727 | 0.113636 | 0.340909 | 0 | 0 | RefSeq | NaN
3 | NC_009820.1 | RefSeq | RefSeq | 52261 | 52434 | + | YP_001469324.1 | hypothetical protein | hypothetical; | 6640.5466 | 0.052632 | 69.292982 | 4.906194 | 0.192982 | 0.263158 | 0.350877 | 12490 | 12615 | RefSeq | NaN
4 | NC_018863.1 | RefSeq | RefSeq | 79762 | 80397 | - | YP_006908410.1 | RNA polymerase sigma factor | replication; | 75.0666 | 0.000000 | 0.000000 | 5.525000 | 0.000000 | 1.000000 | 0.000000 | 0 | 0 | RefSeq | NaN
5 | NC_021309.1 | RefSeq | RefSeq | 45734 | 46045 | + | YP_008052003.1 | hypothetical protein | hypothetical; | 3862.3716 | 0.121212 | 18.009091 | 6.745049 | 0.272727 | 0.121212 | 0.151515 | 8480 | 8605 | RefSeq | NaN
6 | NC_019543.1 | RefSeq | RefSeq | 125465 | 126616 | - | YP_007010876.1 | RNA ligase and tail fiber protein attachment catalyst | lysis;replication;infection; | 4002.4203 | 0.151515 | 46.812121 | 4.751137 | 0.333333 | 0.090909 | 0.303030 | 11460 | 11460 | RefSeq | NaN
7 | NC_014636.1 | RefSeq | RefSeq | 213015 | 214970 | + | YP_003969626.1 | baseplate wedge subunit | infection; | 2521.8880 | 0.095238 | 25.361905 | 6.143238 | 0.380952 | 0.142857 | 0.238095 | 1490 | 1490 | RefSeq | NaN
8 | NC_019725.1 | RefSeq | RefSeq | 33766 | 34455 | + | YP_007112724.1 | membrane associated protein | assembly; | 2026.2280 | 0.052632 | 15.836842 | 5.839661 | 0.210526 | 0.263158 | 0.210526 | 1490 | 1490 | RefSeq | NaN
9 | NC_005859.1 | RefSeq | RefSeq | 43505 | 44143 | - | YP_006913.1 | tail fiber protein | infection; | 246.2603 | 0.000000 | 5.000000 | 4.598695 | 0.500000 | 0.000000 | 0.500000 | 0 | 0 | RefSeq | NaN
Last rows
Index | Phage_ID | Protein_source | Function_prediction_source | Start | Stop | Strand | Protein_ID | Product | Protein_classification | Molecular_weight | Aromaticity | Instability_index | Isoelectric_point | Helix_fraction | Turn_fraction | Sheet_fraction | Reduced_coefficient | Oxidized_coefficient | Phage_source | Function_Prediction_source
49990 | biochar_1561 | prodigal | NaN | 25420 | 25884 | + | biochar_1561_32 | unknown | unsorted; | 1405.5551 | 0.000000 | -10.964286 | 5.210851 | 0.285714 | 0.285714 | 0.142857 | 0 | 0 | STV | -
49991 | biochar_1290 | prodigal | NaN | 9261 | 9680 | + | biochar_1290_21 | unknown | unsorted; | 6911.0897 | 0.000000 | 75.334783 | 8.351057 | 0.275362 | 0.376812 | 0.217391 | 0 | 0 | STV | -
49992 | biochar_3952 | prodigal | NaN | 12725 | 13378 | + | biochar_3952_16 | pectinesterase activity | unsorted; | 661.7448 | 0.000000 | -15.685714 | 5.570017 | 0.285714 | 0.285714 | 0.142857 | 0 | 0 | STV | Iterative search
49993 | biochar_3049 | prodigal | NaN | 15377 | 15877 | + | biochar_3049_24 | unknown | unsorted; | 3524.0537 | 0.259259 | 26.418519 | 8.112137 | 0.444444 | 0.037037 | 0.148148 | 9970 | 9970 | STV | -
49994 | biochar_4543 | prodigal | NaN | 10187 | 10534 | + | biochar_4543_12 | unknown | unsorted; | 5123.7085 | 0.111111 | 62.642222 | 5.304181 | 0.311111 | 0.177778 | 0.377778 | 2980 | 2980 | STV | -
49995 | biochar_4936 | prodigal | NaN | 10051 | 11601 | + | biochar_4936_9 | Belongs to the glycosyl hydrolase 28 family | unsorted; | 2918.1388 | 0.111111 | 2.844444 | 6.659563 | 0.296296 | 0.333333 | 0.185185 | 0 | 0 | STV | eggNOG-mapper
49996 | biochar_1323 | prodigal | NaN | 15408 | 17372 | - | biochar_1323_33 | phage tail tape measure protein | assembly;infection; | 2903.2569 | 0.125000 | 34.941667 | 8.791763 | 0.333333 | 0.125000 | 0.250000 | 0 | 0 | STV | eggNOG-mapper
49997 | biochar_2347 | prodigal | NaN | 17625 | 17918 | + | biochar_2347_27 | unknown | unsorted; | 3046.4022 | 0.111111 | 48.225926 | 4.050028 | 0.444444 | 0.222222 | 0.296296 | 2980 | 2980 | STV | -
49998 | biochar_2839 | prodigal | NaN | 4787 | 6559 | - | biochar_2839_5 | Required for morphogenesis and for the elongation of the flagellar filament by facilitating polymerization of the flagellin monomers at the tip of growing filament. Forms a capping structure, which prevents flagellin subunits (transported through the central channel of the flagellum) from leaking out without polymerization at the distal end | unsorted; | 3199.5959 | 0.033333 | 34.963333 | 6.182798 | 0.233333 | 0.166667 | 0.366667 | 0 | 0 | STV | eggNOG-mapper
49999 | biochar_1064 | prodigal | NaN | 32415 | 32678 | - | biochar_1064_29 | unknown | unsorted; | 1994.2079 | 0.117647 | 83.764706 | 6.326430 | 0.235294 | 0.117647 | 0.411765 | 5500 | 5500 | STV | -
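Assume Start and Stop are 1-based, inclusive nucleotide coordinates; the report does not define them. Under that reading, two quick consistency checks pass on all twenty sample rows:

- every span covers a whole number of codons;
- Oxidized_coefficient exceeds Reduced_coefficient by a non-negative multiple of 125.

The sketch below automates both checks, with the values copied from the sample rows:

```python
# (Start, Stop, Reduced_coefficient, Oxidized_coefficient) for the 20 sample rows
sample = [
    (51457, 51993, 4470, 4470), (8638, 9177, 13980, 13980), (30673, 31647, 0, 0),
    (52261, 52434, 12490, 12615), (79762, 80397, 0, 0), (45734, 46045, 8480, 8605),
    (125465, 126616, 11460, 11460), (213015, 214970, 1490, 1490),
    (33766, 34455, 1490, 1490), (43505, 44143, 0, 0),
    (25420, 25884, 0, 0), (9261, 9680, 0, 0), (12725, 13378, 0, 0),
    (15377, 15877, 9970, 9970), (10187, 10534, 2980, 2980), (10051, 11601, 0, 0),
    (15408, 17372, 0, 0), (17625, 17918, 2980, 2980), (4787, 6559, 0, 0),
    (32415, 32678, 5500, 5500),
]

problems = []
for start, stop, reduced, oxidized in sample:
    length = stop - start + 1  # assumes 1-based, inclusive coordinates
    if length % 3 != 0:
        problems.append(("not whole codons", start, stop))
    if oxidized < reduced or (oxidized - reduced) % 125 != 0:
        problems.append(("unexpected coefficient gap", reduced, oxidized))

print(f"{len(sample)} rows checked, {len(problems)} problems")  # 20 rows checked, 0 problems
```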